Abstract

Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued random vector. In regression analysis
one wants to estimate the regression function $m(x) := \mathbf{E}(Y \mid X = x)$ from
data. In this paper we consider the rate of convergence of the $k$-nearest
neighbor estimator in the case that $X$ is uniformly distributed on $[0, 1]^d$,
$\mathrm{Var}(Y \mid X = x)$ is bounded, and $m$ is $(p, C)$-smooth. It is an open problem
whether the optimal rate can be achieved by some $k$-nearest neighbor
estimator in the case $1 < p \le 1.5$. We solve the problem affirmatively.
This is the main result of this paper. Throughout this paper, we assume
that the data are independent and identically distributed, and as an error
criterion we use the expected $L_2$ error.
1 Introduction

Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued random vector. In regression analysis, one wants
to predict the value of $Y$ after having observed the value of $X$, i.e. to find
a measurable function $f$ such that the mean squared error $\mathbf{E}_{XY}(f(X) - Y)^2$
is minimized, where $\mathbf{E}_{XY}$ denotes the expectation with respect to $(X, Y)$. Let
$m(x) := \mathbf{E}\{Y \mid X = x\}$ (regression function), which is the conditional expectation
of $Y$ given $X = x$. Then $m(x)$ is the solution of the minimization problem. In
fact, one can check that for any measurable function $f$,

$$\mathbf{E}_{XY}(f(X) - Y)^2 = \mathbf{E}_{XY}(m(X) - Y)^2 + \mathbf{E}_X(f(X) - m(X))^2.$$

In statistics, only the data are available (the distribution of $(X, Y)$ and $m$ are not
available), and one needs to estimate the function $m$ from the data $\{(X_i, Y_i)\}_{i=1}^n$,
which are independently distributed according to the distribution of $(X, Y)$.
We wish to construct an estimator $m_n$ of $m$ such that the expected $L_2$ error
$R(m_n) := \mathbf{E}_{X^nY^n}\mathbf{E}_X(m_n(X) - m(X))^2$ is as small as possible, where $\mathbf{E}_{X^nY^n}$
denotes the expectation with respect to the data. In order to analyze the
performance of estimators theoretically, it is very important to evaluate how fast the
error $R(m_n)$ converges to zero when the data size $n$ tends to infinity. In this
paper we consider $k$-NN (nearest neighbor) estimators and the rate of convergence
in the case that $m$ is $(p, C)$-smooth (cf. Györfi et al., 2002, p. 37).

* t-ayano@cr.math.sci.osaka-u.ac.jp

The $k$-NN estimator is defined as follows. Given $x \in \mathbb{R}^d$, we rearrange the
data $(X_1, Y_1), \ldots, (X_n, Y_n)$ in ascending order of the values of $\|X_i - x\|$. As a
tie-breaking rule, if $\|X_i - x\| = \|X_j - x\|$ and $i < j$, we declare that $X_i$ is "closer"
to $x$ than $X_j$. We write the rearranged sequence as $(X_{1,x}, Y_{1,x}), \ldots, (X_{n,x}, Y_{n,x})$.
Notice that $\{(X_{i,x}, Y_{i,x})\}_{i=1}^n$ is expressed by $\{(X_{\pi(i)}, Y_{\pi(i)})\}_{i=1}^n$ using a
permutation $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$ depending on $x \in \mathbb{R}^d$. Then for $1 \le k \le n$,
the $k$-NN estimator $m_n$ is defined by

$$m_n(x) = \frac{1}{k}\sum_{i=1}^{k} Y_{i,x}.$$

For details about $k$-NN estimators, see, for example, Chapter 6 in Györfi et
al. (2002).
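The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `knn_regress` and the representation of the data as lists of coordinate tuples are choices made here, not part of the paper. The tie-breaking rule is obtained from the fact that Python's `sorted` is stable, so equal distances keep the smaller index first.

```python
import math


def knn_regress(x, xs, ys, k):
    """Return m_n(x): the average of the Y-values of the k nearest X_i.

    x  : query point (sequence of d floats)
    xs : list of n sample points X_1, ..., X_n (each a sequence of d floats)
    ys : list of n responses Y_1, ..., Y_n
    k  : number of neighbors, 1 <= k <= n
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("xs and ys must have the same length")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Sort indices by Euclidean distance to x. sorted() is stable, so when
    # two distances are equal the smaller index comes first, which is the
    # tie-breaking rule described in the text.
    order = sorted(range(n), key=lambda i: math.dist(xs[i], x))
    return sum(ys[i] for i in order[:k]) / k
```

For instance, with sample points 0, 1, -1, 3 on the real line and query point 0, the points 1 and -1 are tied, and for k = 2 the tie is resolved in favor of the point with the smaller index.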

Let $p, C > 0$, and express $p$ by $p = q + r$, $q \in \mathbb{Z}_{\ge 0}$, $0 < r \le 1$. We say
that a function $m : \mathbb{R}^d \to \mathbb{R}$ is $(p, C)$-smooth if for all $q_1, \ldots, q_d \in \mathbb{Z}_{\ge 0}$ with
$q = q_1 + \cdots + q_d$, the partial derivatives $\frac{\partial^q m}{\partial x_1^{q_1} \cdots \partial x_d^{q_d}}$ exist and for all $x, z \in \mathbb{R}^d$
the following is satisfied:

$$\left| \frac{\partial^q m}{\partial x_1^{q_1} \cdots \partial x_d^{q_d}}(x) - \frac{\partial^q m}{\partial x_1^{q_1} \cdots \partial x_d^{q_d}}(z) \right| \le C\,\|x - z\|^r.$$

For $p, C, \sigma > 0$, let $\mathcal{D}(p, C, \sigma)$ be the class of distributions of $(X, Y)$ such
that:

(I) $X$ is uniformly distributed on $[0, 1]^d$;

(II) $\mathrm{Var}(Y \mid X = x) \le \sigma^2$;

(III) $m$ is $(p, C)$-smooth,

where $\mathrm{Var}(Y \mid X = x)$ denotes the variance of $Y$ given $X = x$.

The lower bound for the class $\mathcal{D}(p, C, \sigma)$ is known (cf. Györfi et al., 2002,
p. 38):

$$\liminf_{n \to \infty}\ \inf_{m_n}\ \sup_{(X,Y) \in \mathcal{D}(p,C,\sigma)} n^{2p/(2p+d)} R(m_n) \ge \mathrm{const.} > 0, \qquad (1)$$

where $\inf_{m_n}$ denotes the infimum over all estimators.

For $0 < p \le 1$, the rate $n^{-2p/(2p+d)}$ is achieved by the $k$-NN estimator
(cf. Györfi et al., 2002, pp. 93, 99):

$$\sup_{(X,Y) \in \mathcal{D}(p,C,\sigma)} R(m_n) \le \mathrm{const.}\ n^{-2p/(2p+d)}.$$

For $p > 1.5$, it is shown that the rate $n^{-2p/(2p+d)}$ is unachievable by any $k$-NN
estimator, and it is presented as a conjecture that even for $1 < p \le 1.5$ the rate
$n^{-2p/(2p+d)}$ will be achieved by some $k$-NN estimator (cf. Györfi et al., 2002,
p. 96). In this paper, we show that the conjecture is right (Theorem).
Regression analysis is used in many fields, for example economics, medicine, and pattern
recognition (cf. Györfi et al., 2002, pp. 4-9). Nearest neighbor estimators are
very important in regression analysis, and the Theorem establishes the performance of the
nearest neighbor estimator theoretically.

Throughout this paper we will use the following notation: $\mathbb{R}$, $\mathbb{R}_{>0}$, $\mathbb{Z}_{\ge 0}$, $\mathbb{N}$
are the sets of reals, positive reals, nonnegative integers and positive integers.
For a measurable set $D \subset \mathbb{R}^d$, $\mathrm{vol}(D)$ denotes the Lebesgue measure of $D$.
For $x \in \mathbb{R}^d$, $\|x\|$ denotes the Euclidean norm of $x$. For $u, v \in \mathbb{R}^d$, we define
$H(u, v) := \{w \in \mathbb{R}^d \mid \|w - u\| < \|v - u\|\}$ and $G(u, v) := H(u, v) \cap [0, 1]^d$. For
$a > 0$, $\lfloor a \rfloor$ denotes the integer $b$ such that $b \le a < b + 1$.

2 Related Work 

In this section, we give an overview of the related work on consistency and the rate
of convergence. For consistency, it was shown in Stone (1977) that the $k$-NN
estimators are universally consistent. Since then it has been shown that many
estimators share this property (cf. Devroye et al., 1994, Greblicki et al., 1984, Györfi
and Walk, 1997, Kohler, 1999, Kohler and Krzyzak, 2001, Kohler, 2002, Lugosi
and Zeger, 1995, Nobel, 1996, Walk, 2002, Walk, 2005, Walk, 2008). For the
rate of convergence, we know several results as follows:

• Stone (1982) proved the lower bound (1);

• for the distributions satisfying (II) (III) with $0 < p \le 1$ and the
partitioning, kernel, and $k$-NN estimators, the rate $n^{-2p/(2p+d)}$ is achievable if
$X$ is bounded (for the $k$-NN estimator, the condition $d > 2p$ is required
as well) (cf. Györfi, 1981, Györfi et al., 2002, Kulkarni and Posner, 1995,
Spiegelman and Sacks, 1980);

• Kohler et al. (2006, 2009) later proved the same statement without
assuming that $X$ is bounded;

• for the partitioning estimators and the class $\mathcal{D}(p, C, \sigma)$ with $p > 1$, the
rate $n^{-2p/(2p+d)}$ is unachievable (cf. Györfi et al., 2002);

• for the kernel estimators, the rate $n^{-2p/(2p+d)}$ is achievable for $\mathcal{D}(p, C, \sigma)$
with $0 < p \le 1.5$ and is unachievable for that with $p > 1.5$ (cf. Györfi et
al., 2002).

If we summarize the above results in Table 1, only the following problem
remains: does the $k$-NN estimator achieve the rate $n^{-2p/(2p+d)}$ under (II) (III)
even for $1 < p \le 1.5$? The problem is still hard in general, but we solve it
affirmatively under (I) (II) (III).

Table 1: the achievability of $n^{-2p/(2p+d)}$ for the estimators and $\mathcal{D}(p, C, \sigma)$

                achievable          unachievable
partitioning    $0 < p \le 1$       $p > 1$
kernel          $0 < p \le 1.5$     $p > 1.5$
$k$-NN          $0 < p \le 1$       $p > 1.5$

3 Main Result 

For $p, C, \sigma > 0$, let $\mathcal{D}(p, C, \sigma)$ be the class of distributions of $(X, Y)$ such that:

(I) $X$ is uniformly distributed on $[0, 1]^d$;

(II) $\mathrm{Var}(Y \mid X = x) \le \sigma^2$;

(III) $m$ is $(p, C)$-smooth,

where $\mathrm{Var}(Y \mid X = x)$ denotes the variance of $Y$ given $X = x$.

Then we get the following theorem:

Theorem

Let $1 < p \le 1.5$ and let $m_n$ be the $k$-NN estimator with $k = \lfloor n^{2p/(2p+d)} \rfloor$.
Then there exists $C_1 > 0$ (which does not depend on $n$) such that

$$\sup_{(X,Y) \in \mathcal{D}(p,C,\sigma)} \mathbf{E}_{X^nY^n}\mathbf{E}_X(m_n(X) - m(X))^2 \le C_1 n^{-2p/(2p+d)}.$$

4 Proof of Theorem 

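Suppose we are given $X = x, X_1 = x_1, \ldots, X_n = x_n$. We take the expectation
with respect to $Y_1, \ldots, Y_n$. Then the following bias-variance decomposition is
well-known (cf. Györfi et al., 2002, p. 94):

$$\begin{aligned}
\mathbf{E}_{Y^n}(m_n(x) - m(x))^2
&= \mathbf{E}_{Y^n}\Big(\frac{1}{k}\sum_{i=1}^{k}(Y_{i,x} - m(x))\Big)^2 \\
&= \mathbf{E}_{Y^n}\Big(\frac{1}{k}\sum_{i=1}^{k}(Y_{i,x} - m(x_{i,x}))\Big)^2 + \Big(\frac{1}{k}\sum_{i=1}^{k}(m(x_{i,x}) - m(x))\Big)^2 \\
&\le \frac{\sigma^2}{k} + \frac{1}{k^2}\Big\{\sum_{i=1}^{k}(m(x_{i,x}) - m(x))\Big\}^2 \quad (\because \text{(II)}). \qquad (2)
\end{aligned}$$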

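We evaluate the second term of (2). Let $x_{i,x} = (x_{i,x}^{(1)}, \ldots, x_{i,x}^{(d)})$ and $x =
(x^{(1)}, \ldots, x^{(d)})$. Let $m_s$ be the partial derivative of $m$ with respect to the $s$-th
component. Then by the mean-value theorem, there exists $u_i \in \mathbb{R}^d$ such that
$\|u_i - x\| \le \|x_{i,x} - x\|$ and

$$\Big(\sum_{i=1}^{k}(m(x_{i,x}) - m(x))\Big)^2 = \Big(\sum_{i=1}^{k}\sum_{s=1}^{d} m_s(u_i)\,(x_{i,x}^{(s)} - x^{(s)})\Big)^2$$

(the idea of using the mean-value theorem is due to Györfi et al., 2002, p. 84). By the
Cauchy-Schwarz inequality,

$$\begin{aligned}
&\le 2\Big\{\sum_{i=1}^{k}\sum_{s=1}^{d}(m_s(u_i) - m_s(x))(x_{i,x}^{(s)} - x^{(s)})\Big\}^2 + 2\Big\{\sum_{s=1}^{d} m_s(x)\sum_{i=1}^{k}(x_{i,x}^{(s)} - x^{(s)})\Big\}^2 \\
&\le 2kd\sum_{i=1}^{k}\sum_{s=1}^{d}(m_s(u_i) - m_s(x))^2(x_{i,x}^{(s)} - x^{(s)})^2 + 2d\sum_{s=1}^{d} m_s(x)^2\Big\{\sum_{i=1}^{k}(x_{i,x}^{(s)} - x^{(s)})\Big\}^2.
\end{aligned}$$

Let $L > 0$ be such that $\max_{1 \le s \le d,\ x \in [0,1]^d}|m_s(x)| \le L$. Because $m$ is $(p, C)$-smooth
and $\|u_i - x\| \le \|x_{i,x} - x\|$,

$$\begin{aligned}
&\le 2kdC^2\sum_{i=1}^{k}\sum_{s=1}^{d}\|x_{i,x} - x\|^{2p-2}(x_{i,x}^{(s)} - x^{(s)})^2 + 2dL^2\sum_{s=1}^{d}\Big\{\sum_{i=1}^{k}(x_{i,x}^{(s)} - x^{(s)})\Big\}^2 \\
&= 2kdC^2\sum_{i=1}^{k}\|x_{i,x} - x\|^{2p} + 2dL^2\sum_{i=1}^{k}\|x_{i,x} - x\|^{2} \\
&\quad + 2dL^2\sum_{s=1}^{d}\sum_{1 \le i \ne j \le k}(x_{i,x}^{(s)} - x^{(s)})(x_{j,x}^{(s)} - x^{(s)}).
\end{aligned}$$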

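We now regard $x, x_1, \ldots, x_n$ as the random variables $X, X_1, \ldots, X_n$ and take the
expectation with respect to $X, X_1, \ldots, X_n$:

$$\begin{aligned}
\mathbf{E}_X\mathbf{E}_{X^nY^n}(m_n(X) - m(X))^2
&\le \frac{\sigma^2}{k} + \frac{2dC^2}{k}\mathbf{E}_X\mathbf{E}_{X^n}\sum_{i=1}^{k}\|X_{i,X} - X\|^{2p} + \frac{2dL^2}{k^2}\mathbf{E}_X\mathbf{E}_{X^n}\sum_{i=1}^{k}\|X_{i,X} - X\|^{2} \qquad (3) \\
&\quad + \frac{2dL^2}{k^2}\mathbf{E}_X\mathbf{E}_{X^n}\sum_{s=1}^{d}\sum_{1 \le i \ne j \le k}(X_{i,X}^{(s)} - X^{(s)})(X_{j,X}^{(s)} - X^{(s)}). \qquad (4)
\end{aligned}$$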

In order to evaluate the second and third terms in (3), the following
proposition is available.

Proposition (Györfi et al., 2002, pp. 95, 99)

For any $\gamma > 0$, there exists $c_1 > 0$ (depending on $\gamma$ and $d$) such that

$$\frac{1}{k}\mathbf{E}_X\mathbf{E}_{X^n}\sum_{i=1}^{k}\|X_{i,X} - X\|^{2\gamma} \le c_1\Big(\frac{k}{n}\Big)^{2\gamma/d}.$$

The proposition is proved originally for $\gamma = 1$ in Györfi et al., 2002, but we
have extended it to general $\gamma > 0$. We proceed to evaluate (4).

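Let $D = \{(x, x_1, \ldots, x_n) \mid \|x_i - x\| < \|x_{k+1} - x\|,\ i = 1, \ldots, k,\ \|x_j - x\| >
\|x_{k+1} - x\|,\ j = k+2, \ldots, n\}$.

Claim 1

$$\mathbf{E}_X\mathbf{E}_{X^n}\sum_{1 \le i \ne j \le k}(X_{i,X}^{(s)} - X^{(s)})(X_{j,X}^{(s)} - X^{(s)})
= \frac{n \cdots (n-k)}{k!}\int_D \sum_{1 \le i \ne j \le k}(x_i^{(s)} - x^{(s)})(x_j^{(s)} - x^{(s)})\,dx_1 \cdots dx_n\,dx.$$

(See Appendix for proof)

From Claim 1, since $x_1, \ldots, x_k \in G(x, x_{k+1})$ and $x_{k+2}, \ldots, x_n \in [0, 1]^d \setminus G(x, x_{k+1})$ on $D$,

$$\begin{aligned}
T &:= \frac{1}{k^2}\mathbf{E}_X\mathbf{E}_{X^n}\sum_{s=1}^{d}\sum_{1 \le i \ne j \le k}(X_{i,X}^{(s)} - X^{(s)})(X_{j,X}^{(s)} - X^{(s)}) \\
&= \frac{n \cdots (n-k)}{k^2\,k!}\sum_{s=1}^{d}\sum_{1 \le i \ne j \le k}\int_{[0,1]^d}dx\int_{[0,1]^d}dx_{k+1}\int_{G(x,x_{k+1})}(x_i^{(s)} - x^{(s)})\,dx_i \\
&\qquad \cdot \int_{G(x,x_{k+1})}(x_j^{(s)} - x^{(s)})\,dx_j \cdot \mathrm{vol}[G(x, x_{k+1})]^{k-2}\,(1 - \mathrm{vol}[G(x, x_{k+1})])^{n-k-1}.
\end{aligned}$$

Let $W := \{(x, x_{k+1}) \mid G(x, x_{k+1}) \ne H(x, x_{k+1})\}$. Since for $(x, x_{k+1}) \notin W$,
$\int_{G(x,x_{k+1})}(x_i^{(s)} - x^{(s)})\,dx_i = \int_{H(x,x_{k+1})}(x_i^{(s)} - x^{(s)})\,dx_i = 0$, we obtain

$$\begin{aligned}
T &= \frac{n \cdots (n-k)}{k^2\,k!}\sum_{s=1}^{d}\sum_{1 \le i \ne j \le k}\int_W \Big(\int_{G(x,x_{k+1})}(x_i^{(s)} - x^{(s)})\,dx_i\Big)\Big(\int_{G(x,x_{k+1})}(x_j^{(s)} - x^{(s)})\,dx_j\Big) \\
&\qquad \cdot \mathrm{vol}[G(x, x_{k+1})]^{k-2}\,(1 - \mathrm{vol}[G(x, x_{k+1})])^{n-k-1}\,dx_{k+1}\,dx.
\end{aligned}$$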
Claim 2 There exists $c_2 > 0$ (depending only on $d$) such that

$$\Big|\int_{G(x,x_{k+1})}(x_1^{(s)} - x^{(s)})\,dx_1\Big| \le c_2 \cdot \mathrm{vol}[G(x, x_{k+1})]^{(d+1)/d}.$$

(See Appendix for proof)

From Claim 2,

$$\begin{aligned}
T &\le c_2^2\,d\,\frac{n \cdots (n-k)}{k^2\,(k-2)!}\int_W \mathrm{vol}[G(x, x_{k+1})]^{k + 2/d}\,\{1 - \mathrm{vol}[G(x, x_{k+1})]\}^{n-k-1}\,dx_{k+1}\,dx \\
&= c_2^2\,d\,\frac{n \cdots (n-k)}{k^2\,(k-2)!}\int_0^1 u^{k + 2/d}(1 - u)^{n-k-1}\,dF(u), \qquad (5)
\end{aligned}$$

where $F(u)$ is the Lebesgue measure of $S(u) := \{(x, x_{k+1}) \in W \mid 0 \le \mathrm{vol}[G(x, x_{k+1})] \le
u\}$ for $0 \le u \le 1$.

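Claim 3

$$F(u) = \begin{cases}
\displaystyle 2d\sum_{i=0}^{d-1}\Big\{\frac{1}{(d-i)\,e_2^{(d-i)/d}} - \frac{1}{(2d-i)\,e_2^{(d-i)/d}}\Big\}\binom{d-1}{i}(-2)^{d-1-i}\,u^{(2d-i)/d} & \big(0 \le u \le \frac{e_2}{2^d}\big) \\[2ex]
\displaystyle u - 2d\,e_2\int_0^{1/2}(x^{(1)})^d(1 - 2x^{(1)})^{d-1}\,dx^{(1)} & \big(\frac{e_2}{2^d} \le u \le 1\big)
\end{cases}$$

where $e_2 = \mathrm{vol}[\{y \in \mathbb{R}^d \mid \|y\| < 1\}]$, and $e_2 \le 2^d$.
(See Appendix for proof)

For $0 < u < e_2/2^d$ and $e_2/2^d < u < 1$, let $f(u) := F'(u)$ (so that $f(u) \ge 0$):

$$f(u) = \begin{cases}
\displaystyle \sum_{i=0}^{d-1}(4d - 2i)\Big\{\frac{1}{(d-i)\,e_2^{(d-i)/d}} - \frac{1}{(2d-i)\,e_2^{(d-i)/d}}\Big\}\binom{d-1}{i}(-2)^{d-1-i}\,u^{(d-i)/d} & \big(0 < u < \frac{e_2}{2^d}\big) \\[2ex]
1 & \big(\frac{e_2}{2^d} < u < 1\big)
\end{cases}$$

Let $f(0) = f(e_2/2^d) = f(1) = 0$. There exists $c_3 > 0$ (depending only on $d$)
such that $f(u) \le c_3 u^{1/d}$, because for $e_2/2^d < u < 1$, $f(u) = 1 \le (2/e_2^{1/d})\,u^{1/d}$,
and for the other $u$ it is trivial.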

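For $\alpha \in \mathbb{R}_{>0}$ and $\beta \in \mathbb{N}$, let $B(\alpha, \beta) := \int_0^1 u^{\alpha-1}(1 - u)^{\beta-1}\,du$ (Beta
function). Then the following formula is well-known:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} = \frac{\Gamma(\alpha)\,(\beta - 1)!}{(\alpha + \beta - 1) \cdots \alpha\,\Gamma(\alpha)} = \frac{(\beta - 1)!}{\alpha \cdots (\alpha + \beta - 1)},$$

where $\Gamma$ is the Gamma function.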
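The Beta-function identity above is easy to sanity-check numerically. The following is a minimal sketch, not part of the proof; the function names `beta_via_gamma` and `beta_via_product` are hypothetical and introduced only for this illustration. It compares the Gamma-function expression with the finite product for a real first argument and a positive integer second argument.

```python
import math


def beta_via_gamma(alpha, beta):
    """B(alpha, beta) computed as Gamma(alpha) Gamma(beta) / Gamma(alpha + beta)."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)


def beta_via_product(alpha, beta):
    """B(alpha, beta) computed as (beta - 1)! / (alpha (alpha + 1) ... (alpha + beta - 1)).

    Valid for real alpha > 0 and integer beta >= 1, as in the text.
    """
    denom = 1.0
    for j in range(beta):  # factors alpha, alpha + 1, ..., alpha + beta - 1
        denom *= alpha + j
    return math.factorial(beta - 1) / denom
```

In the proof the formula is applied with the first argument $k + 1 + 3/d$ and the second argument $n - k$; for large arguments a log-space version based on `math.lgamma` would be preferable to avoid overflow.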
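On the other hand,

$$\lim_{n \to \infty}\frac{n!}{(1 + \frac{3}{d}) \cdots (n + \frac{3}{d})}\cdot n^{3/d} = \lim_{n \to \infty}\frac{\Gamma(n + 1)\,\Gamma(1 + \frac{3}{d})}{\Gamma(n + 1 + \frac{3}{d})}\cdot n^{3/d}.$$

By Stirling's formula,

$$= \Gamma\big(1 + \tfrac{3}{d}\big)\lim_{n \to \infty}\frac{\Gamma(n + 1)}{\Gamma(n + 1 + \frac{3}{d})}\cdot n^{3/d} = \Gamma\big(1 + \tfrac{3}{d}\big).$$

Therefore, there exist $c_4, c_5 > 0$ (depending only on $d$) such that

$$c_4\,n^{-3/d} \le \frac{n!}{(1 + \frac{3}{d}) \cdots (n + \frac{3}{d})} \le c_5\,n^{-3/d}.$$

From (5) and $f(u) \le c_3 u^{1/d}$,

$$\begin{aligned}
T &\le c_2^2 c_3\,d\,\frac{n \cdots (n-k)}{k^2\,(k-2)!}\int_0^1 u^{k + 3/d}(1 - u)^{n-k-1}\,du \\
&= c_2^2 c_3\,d\,\frac{n \cdots (n-k)}{k^2\,(k-2)!}\,B\Big(k + 1 + \frac{3}{d},\ n - k\Big) \\
&= c_2^2 c_3\,d\,\frac{n \cdots (n-k)}{k^2\,(k-2)!}\cdot\frac{(n-k-1)!}{(k + 1 + \frac{3}{d}) \cdots (n + \frac{3}{d})} \\
&\le c_2^2 c_3\,d\,\frac{(k+1) \cdots n}{(k + 1 + \frac{3}{d}) \cdots (n + \frac{3}{d})} \\
&= c_2^2 c_3\,d\,\frac{n!}{(1 + \frac{3}{d}) \cdots (n + \frac{3}{d})}\cdot\frac{(1 + \frac{3}{d}) \cdots (k + \frac{3}{d})}{k!}
\le \frac{c_2^2 c_3 c_5\,d}{c_4}\Big(\frac{k}{n}\Big)^{3/d}. \qquad (6)
\end{aligned}$$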
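The two-sided bound with $c_4$ and $c_5$ rests on the limit $n!\,n^{a}/((1+a)\cdots(n+a)) \to \Gamma(1+a)$ with $a = 3/d$. The sketch below illustrates this limit numerically; it is an illustration only, and the function name `scaled_ratio` is hypothetical. It uses the identity $(1+a)\cdots(n+a) = \Gamma(n+1+a)/\Gamma(1+a)$ and works in log space through `math.lgamma`, so that large $n$ does not overflow.

```python
import math


def scaled_ratio(n, a):
    """Return n! * n**a / ((1 + a)(2 + a)...(n + a)) for integer n >= 1 and a > 0.

    Uses (1 + a)...(n + a) = Gamma(n + 1 + a) / Gamma(1 + a) and evaluates
    everything through logarithms of the Gamma function.
    """
    log_value = (
        math.lgamma(n + 1)        # log n!
        + math.lgamma(1 + a)      # log Gamma(1 + a)
        - math.lgamma(n + 1 + a)  # minus log Gamma(n + 1 + a)
        + a * math.log(n)         # log n**a
    )
    return math.exp(log_value)
```

For example, with $d = 2$ (so $a = 1.5$), the value of `scaled_ratio(n, 1.5)` approaches `math.gamma(2.5)` as `n` grows, which is the behavior the constants $c_4$ and $c_5$ capture.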

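Therefore, from (3), (6), and the Proposition, there exist $C_2, C_3, C_4 > 0$
(which do not depend on $n$) such that

$$\mathbf{E}_X\mathbf{E}_{X^nY^n}(m_n(X) - m(X))^2 \le \frac{\sigma^2}{k} + C_2\Big(\frac{k}{n}\Big)^{2p/d} + \frac{C_3}{k}\Big(\frac{k}{n}\Big)^{2/d} + C_4\Big(\frac{k}{n}\Big)^{3/d}.$$

Assuming $p \le 1.5$, if we set $k = \lfloor n^{2p/(2p+d)} \rfloor$, there exists $C_1 > 0$ (which does
not depend on $n$) such that

$$\mathbf{E}_X\mathbf{E}_{X^nY^n}(m_n(X) - m(X))^2 \le C_1 n^{-2p/(2p+d)}.$$

This proves the Theorem.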
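The role of the assumption $p \le 1.5$ in the last step can be made concrete by comparing the order in $n$ of each of the four terms when $k$ is of order $n^{2p/(2p+d)}$. The sketch below is an illustration under that idealization (it ignores the floor and all constants); the names `choose_k` and `term_exponents` are hypothetical and not from the paper.

```python
import math


def choose_k(n, p, d):
    """k = floor(n ** (2p / (2p + d))), the choice of k in the Theorem.

    Floating-point rounding may matter when the power is extremely close
    to an integer; this sketch does not guard against that.
    """
    return math.floor(n ** (2 * p / (2 * p + d)))


def term_exponents(p, d):
    """Exponent e of n for each term of the bound, treating k as n ** (2p / (2p + d)).

    Each term is then of order n ** e.
    """
    a = 2 * p / (2 * p + d)  # k is of order n ** a
    b = a - 1                # k / n is of order n ** b, with b = -d / (2p + d)
    return {
        "variance": -a,                   # sigma^2 / k
        "bias": b * 2 * p / d,            # (k / n) ** (2p / d)
        "second_moment": -a + b * 2 / d,  # (1 / k) (k / n) ** (2 / d)
        "cross": b * 3 / d,               # (k / n) ** (3 / d)
    }
```

The cross term has exponent $-3/(2p+d)$, which is at most $-2p/(2p+d)$ exactly when $p \le 1.5$; for larger $p$ this term decays more slowly than the target rate, which is where the argument stops working.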

□ 

Appendix 
A Proof of Claim 1 

Let $h \in N := \{1, \ldots, n\}$ and $I, J \subset N \setminus \{h\}$ be such that $\#I = k$, $I \cap J =
\{\}$ (empty), $I \cup J = N \setminus \{h\}$, where $\#$ denotes the number of elements.
Let $D(I, J, h) := \{(x, x_1, \ldots, x_n) \mid \|x_i - x\| < \|x_h - x\|,\ i \in I,\ \|x_j - x\| >
\|x_h - x\|,\ j \in J\}$. Since $\mathrm{vol}\{[0, 1]^{d(n+1)} \setminus \bigcup_{I,J,h} D(I, J, h)\} = 0$ and for $(I, J, h) \ne
(I', J', h')$, $D(I, J, h) \cap D(I', J', h') = \{\}$, we have

$$\mathbf{E}_X\mathbf{E}_{X^n}\sum_{1 \le i \ne j \le k}(X_{i,X}^{(s)} - X^{(s)})(X_{j,X}^{(s)} - X^{(s)})
= \sum_{I,J,h}\int_{D(I,J,h)}\sum_{i, j \in I,\ i \ne j}(x_i^{(s)} - x^{(s)})(x_j^{(s)} - x^{(s)})\,dx_1 \cdots dx_n\,dx.$$

Since for each $(I, J, h)$ the above integral has the same value and the number of
$(I, J, h)$ is ${}_nC_k \cdot (n - k) = n \cdots (n-k)/k!$, we get Claim 1.

□

B Proof of Claim 2

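Let $e_1 := \int_{\|y\| < 1,\ y^{(s)} > 0} y^{(s)}\,dy$ and $e_2 := \int_{\|y\| < 1}dy$. Then for any $R > 0$,

$$\int_{\|y\| < R,\ y^{(s)} > 0} y^{(s)}\,dy = e_1 R^{d+1}$$

and

$$\int_{\|y\| < R}dy = e_2 R^d, \qquad (7)$$

thus we have

$$\begin{aligned}
\Big|\int_{G(x,x_{k+1})}(x_1^{(s)} - x^{(s)})\,dx_1\Big|
&\le \int_{H(x,x_{k+1})}|x_1^{(s)} - x^{(s)}|\,dx_1 = 2\int_{\|y\| < \|x - x_{k+1}\|,\ y^{(s)} > 0} y^{(s)}\,dy \\
&= 2e_1\|x - x_{k+1}\|^{d+1} = \frac{2e_1}{e_2^{(d+1)/d}}\,\mathrm{vol}[H(x, x_{k+1})]^{(d+1)/d}.
\end{aligned}$$

If we prove the following lemma, the proof of Claim 2 is complete, since the lemma
bounds $\mathrm{vol}[H(x, x_{k+1})]$ by $\mathrm{vol}[G(x, x_{k+1})]/e_3$ and Claim 2 then holds with
$c_2 := 2e_1/(e_2 e_3)^{(d+1)/d}$: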
Lemma

There exists $e_3 > 0$ (depending only on $d$) such that for any $u, v \in [0, 1]^d$,

$$\mathrm{vol}[G(u, v)] \ge e_3\,\mathrm{vol}[H(u, v)].$$

(Proof of Lemma)

Suppose $\|u - v\| \le 1/2$. Let $I := \{i \mid 0 \le u^{(i)} \le 1/2\}$ and $M := \{w \mid \|w -
u\| < \|u - v\|,\ w^{(i)} \ge u^{(i)},\ i \in I,\ w^{(j)} \le u^{(j)},\ j \notin I\}$. Then $M \subset G(u, v)$ and
$\mathrm{vol}[M] = 2^{-d}\,\mathrm{vol}[H(u, v)]$, thus

$$\mathrm{vol}[G(u, v)] \ge 2^{-d}\,\mathrm{vol}[H(u, v)]. \qquad (8)$$

Suppose $\|u - v\| > 1/2$. Since $\|u - v\| \le \sqrt{d}$, we have $\mathrm{vol}[H(u, v)] \le e_2 d^{d/2}$.
From (8), for $z \in \mathbb{R}^d$ such that $\|u - z\| = 1/2$,

$$\mathrm{vol}[G(u, v)] \ge \mathrm{vol}[G(u, z)] \ge 2^{-d}\,\mathrm{vol}[H(u, z)] = 2^{-d}(e_2 2^{-d}).$$

Hence

$$\frac{\mathrm{vol}[G(u, v)]}{\mathrm{vol}[H(u, v)]} \ge 2^{-2d} d^{-d/2}.$$

Let $e_3 := \min\{2^{-d},\ 2^{-2d} d^{-d/2}\}$; then we get the Lemma.

□

C Proof of Claim 3

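For $l = 1, \ldots, d$, let

$$V_l := \{x \in [0, 1]^d \mid x^{(l)} \le \min\{x^{(j)}, 1 - x^{(j)}\},\ j = 1, \ldots, d\},$$

$$V_{l+d} := \{x \in [0, 1]^d \mid 1 - x^{(l)} \le \min\{x^{(j)}, 1 - x^{(j)}\},\ j = 1, \ldots, d\}.$$

Since $\bigcup_{i=1}^{2d} V_i = [0, 1]^d$ and for $i \ne j$, $\mathrm{vol}[V_i \cap V_j] = 0$, by Fubini's theorem,

$$F(u) = \int_{S(u)}dx_{k+1}\,dx = \int_{[0,1]^d}\Big\{\int_{S(u)}dx_{k+1}\Big\}dx = \sum_{i=1}^{2d}\int_{V_i}\Big\{\int_{S(u)}dx_{k+1}\Big\}dx.$$

Without loss of generality, we assume $x \in V_1$.

Let $y := (0, x^{(2)}, \ldots, x^{(d)})$. Since $H(x, y) \subset [0, 1]^d$, for $u \le \mathrm{vol}[H(x, y)]$,

$$0 \le \mathrm{vol}[G(x, x_{k+1})] \le u \ \Rightarrow\ G(x, x_{k+1}) = H(x, x_{k+1}) \ \Rightarrow\ (x, x_{k+1}) \notin W$$

$$\Rightarrow\ \{x_{k+1} \mid (x, x_{k+1}) \in S(u)\} = \{\}.$$

For $\mathrm{vol}[H(x, y)] < u$ and $z \in \mathbb{R}^d$ such that $\mathrm{vol}[G(x, z)] = u$, we have

$$\{x_{k+1} \mid (x, x_{k+1}) \in S(u)\} = G(x, z) \setminus H(x, y)$$

and from (7), $\mathrm{vol}[G(x, z) \setminus H(x, y)] = u - e_2 (x^{(1)})^d$.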

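$$\begin{aligned}
U &:= \int_{V_1}\Big\{\int_{S(u)}dx_{k+1}\Big\}dx = \int_{V_1}\max\{u - e_2 (x^{(1)})^d,\ 0\}\,dx \\
&= \int_0^{\min\{(u/e_2)^{1/d},\,1/2\}}dx^{(1)}\int_{x^{(1)}}^{1 - x^{(1)}}dx^{(2)} \cdots \int_{x^{(1)}}^{1 - x^{(1)}}dx^{(d)}\,\big(u - e_2 (x^{(1)})^d\big) \\
&= \int_0^{\min\{(u/e_2)^{1/d},\,1/2\}}\big(u - e_2 (x^{(1)})^d\big)(1 - 2x^{(1)})^{d-1}\,dx^{(1)}.
\end{aligned}$$

For $(u/e_2)^{1/d} \le 1/2$, i.e. $0 \le u \le e_2/2^d$,

$$\begin{aligned}
U &= \int_0^{(u/e_2)^{1/d}}\big(u - e_2 (x^{(1)})^d\big)(1 - 2x^{(1)})^{d-1}\,dx^{(1)} \\
&= \int_0^{(u/e_2)^{1/d}}\big(u - e_2 (x^{(1)})^d\big)\sum_{i=0}^{d-1}\binom{d-1}{i}(-2x^{(1)})^{d-1-i}\,dx^{(1)} \\
&= \sum_{i=0}^{d-1}\Big\{\frac{1}{(d-i)\,e_2^{(d-i)/d}} - \frac{1}{(2d-i)\,e_2^{(d-i)/d}}\Big\}\binom{d-1}{i}(-2)^{d-1-i}\,u^{(2d-i)/d}.
\end{aligned}$$

For $1/2 < (u/e_2)^{1/d}$, i.e. $e_2/2^d < u \le 1$,

$$U = \int_0^{1/2}\big(u - e_2 (x^{(1)})^d\big)(1 - 2x^{(1)})^{d-1}\,dx^{(1)} = \frac{u}{2d} - e_2\int_0^{1/2}(x^{(1)})^d(1 - 2x^{(1)})^{d-1}\,dx^{(1)}.$$

Since the $2d$ integrals over $V_1, \ldots, V_{2d}$ coincide by symmetry, $F(u) = 2dU$, and we have got Claim 3.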



□ 